EDET: Entity Descriptor Encoder of Transformer for Multi-Modal Knowledge Graph in Scene Parsing

Abstract

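In scene parsing, the model is required to be able to process complex multi-modal data such as images and contexts in real scenes, and to discover their implicit connections from the objects existing in the scene. As a storage method that contains entity information and the relationship between entities, a knowledge graph can well express the objects in the scene and the semantic relationship between them. In this paper, a new multi-phase process was proposed to solve scene parsing tasks; first, a knowledge graph is used to align the multi-modal information, and then a graph-based model generates the results. We also designed an experiment of feature engineering's validation for a deep-learning model to preliminarily verify the effectiveness of the method. Hence, we proposed a knowledge representation method named Entity Descriptor Encoder of Transformer (EDET), which uses both the entity itself and its internal attributes for knowledge representation. This method can be embedded into the transformer structure to solve multi-modal scene parsing tasks. EDET can aggregate the multi-modal attributes of entities, and the results in scene graph generation and image captioning tasks prove that EDET has excellent performance in multi-modal fields. Finally, the proposed method was applied to an industrial scene, which confirmed the viability of our method.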
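The abstract's two central ideas, a knowledge graph that stores entities together with the relationships between them, and a representation built from both the entity itself and its internal attributes, can be illustrated with a small sketch. Everything named here (`Entity`, `KnowledgeGraph`, `toy_embed`, `describe`, `DIM`, and the example entities) is hypothetical, and averaging the vectors is only an assumed stand-in for aggregation; the abstract does not specify how EDET actually encodes or combines attributes.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

DIM = 4  # toy embedding width, chosen arbitrarily for illustration


@dataclass
class Entity:
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class KnowledgeGraph:
    entities: dict[str, Entity] = field(default_factory=dict)
    # (head, relation, tail) triples between entity names
    triples: list[tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, name: str, **attributes: str) -> None:
        self.entities[name] = Entity(name, dict(attributes))

    def add_relation(self, head: str, relation: str, tail: str) -> None:
        if head not in self.entities or tail not in self.entities:
            raise KeyError("both endpoints must be known entities")
        self.triples.append((head, relation, tail))


def toy_embed(token: str) -> list[float]:
    """Deterministic stand-in for a learned embedding (assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]


def describe(entity: Entity) -> list[float]:
    """Average the entity's own vector with one vector per attribute.

    Mean pooling is an assumption made for this sketch, not the
    aggregation used by EDET.
    """
    vectors = [toy_embed(entity.name)]
    vectors += [toy_embed(f"{k}={v}") for k, v in sorted(entity.attributes.items())]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Invented example scene: a box lying on a conveyor.
kg = KnowledgeGraph()
kg.add_entity("conveyor", color="grey", state="running")
kg.add_entity("box", color="brown")
kg.add_relation("box", "on", "conveyor")
descriptor = describe(kg.entities["box"])
```

In a transformer-based pipeline, a vector like `descriptor` would presumably be supplied as one input token per entity, but that wiring is not described in the abstract and is left out of the sketch.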

Similar resources

Scene Graph Parsing as Dependency Parsing

In this paper, we study the problem of parsing structured knowledge graphs from textual descriptions. In particular, we consider the scene graph representation (Johnson et al., 2015) that considers objects together with their attributes and relations: this representation has been proved useful across a variety of vision and language applications. We begin by introducing an alternative but equiv...

Multi-modal Variational Encoder-Decoders

Recent advances in neural variational inference have facilitated efficient training of powerful directed graphical models with continuous latent variables, such as variational autoencoders. However, these models usually assume simple, unimodal priors, such as the multivariate Gaussian distribution, yet many real-world data distributions are highly complex and multi-modal. Examples of complex a...

Multi-Modal Scene Interpretation

The visionary goal of developing an easy-to-use service robot implies several key tasks such as speech understanding, object recognition and scene understanding. Besides the more sensor-oriented capabilities, such systems need extensive meta knowledge, e.g., about mental representations of spatial relations to match the view between man and machine. Only if all parts fit together an unrestricted...

Co-inference for Multi-modal Scene Analysis

We address the problem of understanding scenes from multiple sources of sensor data (e.g., a camera and a laser scanner) in the case where there is no one-to-one correspondence across modalities (e.g., pixels and 3-D points). This is an important scenario that frequently arises in practice not only when two different types of sensors are used, but also when the sensors are not co-located and ha...

Multi-Modal Scene Understanding for Robotic Grasping

Current robotics research is largely driven by the vision of creating an intelligent being that can perform dangerous, difficult or unpopular tasks. These can, for example, be exploring the surface of the planet Mars or the bottom of the ocean, maintaining a furnace or assembling a car. They can also be more mundane, such as cleaning an apartment or fetching groceries. This vision has been pursued sin...

Journal

Journal title: Applied Sciences

Year: 2023

ISSN: 2076-3417

DOI: https://doi.org/10.3390/app13127115